The code for this analysis can be found at “./R/Eades_Final_Clean.R”
My original question was “Does a player’s batting average affect their performance in the field, specifically their fielding percentage?” I will first show the analyses for this question. After that, I will run additional models on other statistics to see which relationships are the most significant. More details to come.
This first data set, df1, consists of three variables and eleven observations. The three variables are the year (2010-2020), the overall average batting average for each year, and the overall average fielding percentage for each year.
This plot is very informative for answering the question that sparked this project.
I predicted that these graphs would look similar to each other, meaning that as one increased so would the other. But as you can see, they are almost exact opposites. Even with this seemingly negative relationship, the model shows that the relationship between the two is not statistically significant.
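A data frame shaped like df1 could be built by averaging player-level rows within each year. The sketch below is only an illustration under stated assumptions: the `players` data frame, its values, and the column names `yearID`, `BA`, `FPCT`, `Year`, and `AvgFP` are hypothetical placeholders (only `AvgBA` appears in the model output in this report), and the numbers are not real MLB data.

```r
# Toy player-season rows; the values are placeholders, not real MLB data.
players <- data.frame(
  yearID = c(2010, 2010, 2011, 2011),
  BA     = c(0.250, 0.270, 0.240, 0.260),  # batting average per player-season
  FPCT   = c(0.980, 0.990, 0.985, 0.975)   # fielding percentage per player-season
)

# Collapse to one row per year: mean batting average and mean fielding percentage.
df1_sketch <- aggregate(cbind(BA, FPCT) ~ yearID, data = players, FUN = mean)

# Rename to df1-style columns (Year and AvgFP are assumed names).
names(df1_sketch) <- c("Year", "AvgBA", "AvgFP")

print(df1_sketch)
```

With the real data, the same call would return eleven rows, one for each year from 2010 to 2020.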
##             Df    Sum Sq   Mean Sq F value Pr(>F)
## AvgBA        1 1.293e-06 1.293e-06   3.467 0.0955 .
## Residuals    9 3.356e-06 3.729e-07
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
My conclusion for my original question is that there is no statistically significant relationship between batting average and fielding percentage in Major League Baseball at the yearly level, as the p-value (0.0955) is greater than .05.
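As a check on how to read the table above, the F value and p-value can be re-derived from the printed mean squares and degrees of freedom. This is a minimal sketch that uses only numbers copied from that table; the variable names are my own, and small differences from the printed output are expected because the table values are rounded.

```r
# Values copied from the ANOVA table above.
ms_model <- 1.293e-06  # Mean Sq for AvgBA
ms_resid <- 3.729e-07  # Mean Sq for Residuals
df_model <- 1          # Df for AvgBA
df_resid <- 9          # Df for Residuals

# F is the ratio of the model mean square to the residual mean square.
f_value <- ms_model / ms_resid

# The p-value is the upper tail of the F distribution at that ratio.
p_value <- pf(f_value, df_model, df_resid, lower.tail = FALSE)

print(round(f_value, 3))
print(round(p_value, 4))
```

Both results should land close to the reported 3.467 and 0.0955, which is above the .05 cutoff but below the 0.1 level marked by the '.' code.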
To get the most accurate results, I need to include only players who played enough games for their statistics to be meaningful. I chose a cutoff of 90 games, which still gives me a huge data frame to work with. FieldG is a data frame that includes the fielding data for all players who played 90 or more games in the years 2010-2020. BatG is the same but for players’ batting data.
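The filtering step might look something like the sketch below. It is an assumption about the approach, not the code from the script: the `fielding` data frame, its rows, and the column names `playerID`, `yearID`, and `G` are hypothetical stand-ins for the real fielding table.

```r
# Toy fielding rows; the values are placeholders, not real MLB data.
fielding <- data.frame(
  playerID = c("a01", "b02", "c03", "d04"),
  yearID   = c(2009, 2012, 2015, 2019),
  G        = c(120, 45, 90, 150)  # games played in that season
)

# Keep seasons from 2010 through 2020 in which the player appeared in 90+ games.
FieldG_sketch <- subset(fielding, G >= 90 & yearID >= 2010 & yearID <= 2020)

print(FieldG_sketch)
```

Here the first row is dropped for falling outside the year range and the second for having too few games. The same condition applied to a batting table would give the BatG equivalent.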
This plot shows that the fewer errors a player makes in the field, the higher their fielding percentage. This is expected, as errors are the primary reason a fielding percentage drops.
This plot shows that the more home runs a player hits in a season, the higher their batting average tends to be. This relationship is less clear than the one in the graph above.
This first model shows that fielding percentage as a function of errors made is extremely significant, with a p-value of less than 2e-16.
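The link between errors and fielding percentage is built into the statistic itself: fielding percentage is putouts plus assists, divided by total chances (putouts plus assists plus errors). A small sketch with made-up chance counts shows the effect; the function name and the input values are illustrative only.

```r
# Fielding percentage = (putouts + assists) / (putouts + assists + errors).
fielding_pct <- function(po, a, e) {
  (po + a) / (po + a + e)
}

# Hold putouts and assists fixed (made-up values) and vary the error count.
errors <- c(0, 5, 10, 20)
fpct <- fielding_pct(po = 300, a = 100, e = errors)

print(round(fpct, 3))
# 1.000 0.988 0.976 0.952
```

Because errors appear only in the denominator, every additional error lowers the percentage when putouts and assists are held fixed.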
##               Df Sum Sq Mean Sq F value Pr(>F)
## E              1 0.2307  0.2307    2398 <2e-16 ***
## Residuals   2635 0.2535  0.0001
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This second model shows that batting average as a function of home runs hit is also extremely significant, with a p-value of less than 2e-16.
##               Df Sum Sq Mean Sq F value Pr(>F)
## HR             1 0.1317 0.13169   153.3 <2e-16 ***
## Residuals   2911 2.5007 0.00086
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
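Looking only at the p-values, the two models appear almost identical, since both fall below 2e-16, the smallest value R prints. That means both relationships are highly significant, not that they are equally strong. The F values differ widely (2398 for errors versus 153.3 for home runs), and the sums of squares show that errors account for roughly 48% of the variation in fielding percentage, while home runs account for only about 5% of the variation in batting average.
In conclusion, I regret to report that batting average and fielding percentage do not show a statistically significant relationship with each other. Through modeling, I was able to find two significant predictors of a player’s batting average and fielding percentage: home runs and errors, respectively. Of the two, errors are by far the stronger predictor.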
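A rough way to compare how much each model explains is the share of total variation attributed to the predictor, which can be computed from the Sum Sq columns of the two tables above. This sketch uses only the printed, rounded values, so the results are approximate; the helper name `r_squared` is my own.

```r
# Share of variation explained = model Sum Sq / (model Sum Sq + residual Sum Sq).
r_squared <- function(ss_model, ss_resid) {
  ss_model / (ss_model + ss_resid)
}

# Sum Sq values copied from the two ANOVA tables above.
r2_errors <- r_squared(ss_model = 0.2307, ss_resid = 0.2535)  # fielding % ~ errors
r2_hr     <- r_squared(ss_model = 0.1317, ss_resid = 2.5007)  # batting avg ~ home runs

print(round(c(errors = r2_errors, home_runs = r2_hr), 3))
```

The first ratio comes out near 0.48 and the second near 0.05, so two models with the same printed p-value can differ a great deal in how much of the outcome they account for.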